I.  INTRODUCTION


A.  PURPOSE 

The purpose of this paper is to provide an investigative methodology and a
useful statistical analysis plan to support the Test Program Definition (TPD)
recently developed by the Joint Forward Area Air Defense (JFAAD) Test Force.
The two primary Test Force objectives are to improve forward area
air defense performance and to reduce friendly air casualties due to
ground-based air defense, and also to identify joint tactical, doctrinal, and
procedural changes which assist in attaining these objectives. These
objectives necessitate an in-depth examination of three major issues:

1.  To what degree do the collective means of aircraft
identification influence the effectiveness of the forward
area air defense systems?

2.  To what degree do projected C³I capabilities support
JFAAD elements?

3.  How does airspace management and control affect the
mission accomplishment of FAAD systems and friendly
aircraft?

Detailed  "Patterns  of  Analysis"  for  each  of  these  issues  are  contained  in  the 
TPD. 

B.  OBJECTIVES

The primary goal of this effort is the development of an investigative
methodology that contains specific statistical analysis plans for each major
issue. It is desired that such plans be:

1.  Theoretically plausible - Analysis plans should be
based upon sound theoretical considerations that provide
not only credibility of statistical test results, but also
maximum power in announcing such results.

2.  Flexible - The analysis plan should be sufficiently
flexible to provide multiple statistical procedures for
consideration yet allow the final choice of a specific
statistical test, or tests, to be suggested by the actual
data generated from the testbed simulation.

After developing an analysis approach, including comparisons of its advantages
and disadvantages with other statistical approaches, the statistical plans
will be implemented by performing a "pilot test" on manually developed,
"dummy" data. It is anticipated that such an effort will provide useful
results for the JFAAD Test Force and also provide some guidance in identifying
options and potential problem areas as the Analysis Directorate develops,
refines, and executes its Master Analysis Plan. At best or at worst,
respectively, it is offered as a means of providing either:

1.  The  primary  statistical  analysis  methodology  to 
support  the  JFAAD  TPD,  or 

2.  A  theoretically  valid  and  reasonable  alternative, 
available  as  a  back-up  or  adjunct  analysis  plan  if  needed. 

C.  CONTENT 

The development of the statistical analysis methodology has been
partitioned as follows:


1.  Theoretical issues involved in statistical analysis
are discussed in the context of the JFAAD issues:

-  Type I and Type II errors are explained,

-  Random sampling with and without replacement in a
discrete counting process is reviewed,

-  The two major categories of statistical tests
(parametric and nonparametric) are compared and
contrasted,

-  The importance of sample size and its relation to
power efficiency and asymptotic relative efficiency
(ARE) is reviewed.

2.  A general statistical analysis methodology is proposed
in consonance with the theoretical issues previously
emphasized. The advantages and disadvantages of the
methodology are cited. Detailed statistical analysis
plans are then outlined in flow diagram form to support
the patterns of analysis for each of the three test
issues:

-  Aircraft ID Statistical Analysis Plan,

-  C³I Statistical Analysis Plan,

-  Airspace Management Statistical Analysis Plan.

3.  Conclusions are drawn from this study effort and
recommendations offered.


II.  STATISTICAL  ANALYSIS  METHODOLOGY 
A.  SOME  THEORETICAL  CONSIDERATIONS 

1.  Random  Sampling  With  and  Without  Replacement. 

One of the difficulties encountered in developing a statistical
analysis methodology to support the JFAAD TPD is the form of the underlying
distribution of data. Classical parametric statistics assume the
distributional form of the random variable to be both continuous and normal.
However, in most cases the data will not be in this form but rather will be
obtained from a discrete counting process generated by a dichotomous (binary)
random variable (e.g., number of detections, number of engagements, etc.).

While one might immediately surmise the applicability of the binomial
distribution as a valid probability model, consideration must be given to the
sampling process. In examining the early warning MOP, for example, the number
of detections (c) out of n engagement opportunities may be viewed as sampling
with replacement. In this instance, use of the binomial distribution to model
the number of detections (X) can possibly be justified on theoretical grounds,
assuming that detection from one weapon system (or crew) to the next can be
considered independent:

$$ P(X = c) = \binom{n}{c} p^{c} (1 - p)^{n - c} $$

$$ \mu = np $$

$$ \sigma^2 = np(1 - p) $$
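As an illustration only, the binomial model can be checked numerically. The sketch below is not part of the original analysis plan; the values n = 20 and p = 0.3 are hypothetical, chosen simply to show that the mean and variance computed from the probability function agree with np and np(1 - p).

```python
from math import comb


def binomial_pmf(c: int, n: int, p: float) -> float:
    """P(X = c) for X ~ Binomial(n, p)."""
    return comb(n, c) * p**c * (1 - p) ** (n - c)


# Hypothetical values: 20 engagement opportunities, detection probability 0.3.
n, p = 20, 0.3
pmf = [binomial_pmf(c, n, p) for c in range(n + 1)]

# Mean and variance computed directly from the probability function.
mean = sum(c * q for c, q in enumerate(pmf))
variance = sum((c - mean) ** 2 * q for c, q in enumerate(pmf))

print(f"mean     = {mean:.3f}   (np       = {n * p:.3f})")
print(f"variance = {variance:.3f}   (np(1-p)  = {n * p * (1 - p):.3f})")
```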


However, in examining the number of aircraft killed, the binomial is no
longer fully justified. Sampling is now occurring without replacement, hence
the appropriate distribution to consider is the hypergeometric with its finite
population correction factor (N - n)/(N - 1):

$$ P(X = c) = \frac{\binom{Np}{c} \binom{N(1 - p)}{n - c}}{\binom{N}{n}} $$

$$ \mu = np $$

$$ \sigma^2 = np(1 - p) \, \frac{N - n}{N - 1} $$
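The effect of the finite population correction can be seen in a small numerical sketch. The population size, number of "successes", and sample size below are hypothetical, and the parameter K stands for the product Np; none of these values come from the TPD.

```python
from math import comb


def hypergeom_pmf(c: int, N: int, K: int, n: int) -> float:
    """P(X = c) when n items are drawn without replacement from a
    population of N items, K of which are 'successes' (K = N * p)."""
    return comb(K, c) * comb(N - K, n - c) / comb(N, n)


# Hypothetical values: 40 aircraft, 12 of one type, sample of 10.
N, K, n = 40, 12, 10
p = K / N

support = range(max(0, n - (N - K)), min(n, K) + 1)
pmf = {c: hypergeom_pmf(c, N, K, n) for c in support}

mean = sum(c * q for c, q in pmf.items())
variance = sum((c - mean) ** 2 * q for c, q in pmf.items())

fpc = (N - n) / (N - 1)  # finite population correction factor
print(f"mean     = {mean:.4f}   (np = {n * p:.4f})")
print(f"variance = {variance:.4f}   (np(1-p) x fpc = {n * p * (1 - p) * fpc:.4f})")
print(f"binomial variance without correction = {n * p * (1 - p):.4f}")
```

Because the correction factor is below 1 whenever n > 1, the hypergeometric variance is smaller than the binomial variance for the same n and p.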

Fortunately, both the binomial and hypergeometric distributions can be
approximated by the normal distribution. The approximation improves for
increasingly larger sample sizes and provides acceptable results conditional
upon np > 5, n(1 - p) > 5, and, for the hypergeometric, n/N < .1.

For values of np < 5, especially when p is very small and n is large, it
is suggested that the Poisson distribution be used to approximate the
binomial, with λ = np.
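These rules of thumb can be expressed as a short check. The following is a minimal sketch under stated assumptions: the function names are illustrative, and the case n = 200, p = 0.01 is a hypothetical rare-event example, not data from the testbed.

```python
from math import comb, exp, factorial


def normal_approx_ok(n, p, N=None):
    """Rule of thumb: np > 5 and n(1-p) > 5; when a finite population
    size N is supplied (hypergeometric case), also require n/N < 0.1."""
    ok = n * p > 5 and n * (1 - p) > 5
    if N is not None:
        ok = ok and n / N < 0.1
    return ok


def binomial_pmf(c, n, p):
    return comb(n, c) * p**c * (1 - p) ** (n - c)


def poisson_pmf(c, lam):
    return exp(-lam) * lam**c / factorial(c)


# Hypothetical rare-event case: np = 2, so the normal rule of thumb fails
# and the Poisson approximation with lambda = np is examined instead.
n, p = 200, 0.01
lam = n * p
max_gap = max(abs(binomial_pmf(c, n, p) - poisson_pmf(c, lam)) for c in range(11))

print("normal approximation acceptable:", normal_approx_ok(n, p))
print(f"largest binomial/Poisson difference for c = 0..10: {max_gap:.5f}")
```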


(In the hypergeometric expressions of this subsection, n is the sample size
and N is the total population size.)

2.  Statistical Independence

In a discrete counting process, such as those modeled by the
binomial, hypergeometric, and Poisson distributions, a second important
theoretical consideration involves the fundamental concept of independence,
which is a major assumption in statistical theory. Two events A and B are
said to be statistically independent if the occurrence or nonoccurrence of A
has no effect on the probability of B, and vice versa. The existence, or
assumed existence, of independent Bernoulli trials is another condition, in
addition to random sampling with replacement, that should be met when using
the binomial distribution to model a counting process. In the JFAAD context,
the indiscriminate use of the binomial distribution, and tests of proportion
based upon normal approximations to the binomial, must be guarded against
because the assumption of independence is rarely likely to be valid, depending
upon the fidelity of the testbed simulation. For example, the likelihood of
Stinger Team B visually detecting an aircraft that has just overflown Stinger
Team A's adjacent position may be considered to be independent if
communications between the teams are nonexistent. However, if communications
do exist, Team B's visual sighting probability becomes conditional upon whether
or not Team A visually detects the target and transfers early warning
information. Thus, Team B's detection event is dependent, to some degree, upon
Team A's success in its target detection event. This sort of conditional
dependence, or correlation, among events, in addition to the complexity of the
situation within the JFAAD region, is what necessitates performing
replications of a large scale simulation.
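The Stinger team example can be made concrete with a few lines of arithmetic. The probabilities below are purely hypothetical (they are not estimates from any JFAAD source); the sketch only shows how cueing between teams makes the joint detection probability differ from the product of the marginal probabilities, and how that changes the variance of the detection count.

```python
# Hypothetical detection probabilities (illustrative values only).
p_a = 0.5               # Team A detects the aircraft
p_b_given_a = 0.8       # Team B detects, given Team A detected and cued it
p_b_given_not_a = 0.4   # Team B detects, given no cue from Team A

# Marginal probability for Team B (law of total probability).
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Joint probability versus what independence would imply.
p_both = p_a * p_b_given_a
p_both_if_independent = p_a * p_b
covariance = p_both - p_both_if_independent

# Variance of the total number of detections (0, 1, or 2).
var_if_independent = p_a * (1 - p_a) + p_b * (1 - p_b)
var_actual = var_if_independent + 2 * covariance

print(f"P(B) = {p_b:.2f}")
print(f"P(A and B) = {p_both:.2f}, product of marginals = {p_both_if_independent:.2f}")
print(f"variance assuming independence = {var_if_independent:.2f}")
print(f"variance with cueing           = {var_actual:.2f}")
```

With positive dependence of this kind, a binomial model would understate the variability of the detection count.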

3.  Parametric  and  Nonparametric  Statistics 

Two general categories of statistical testing procedures are
available: parametric and nonparametric (or distribution-free) tests. The
theory for classical parametric procedures is well-established and allows a
wide range of hypotheses to be tested. The use of such tests is very common,
especially for large-scale experiments requiring an examination of the effects
of multiple independent variables upon one or more response variables (e.g.,
ANOVA, MANOVA, fractional factorial designs, etc.). Although ANOVA techniques
are relatively robust parametric tests, they do require several assumptions,
some more critical than others. When critical assumptions are not
sufficiently met, ANOVA tests may provide biased, even erroneous, results. In
contrast, nonparametric techniques require fewer assumptions and, in cases
where assumptions essential for ANOVA cannot be sufficiently met, provide
unbiased and more powerful tests than their parametric equivalents.

Consequently, it appears desirable to develop a statistical analysis plan
that allows either approach to be used. Such an approach encourages selection
of a particular test, either parametric or nonparametric (or perhaps both), to
be based upon the distributional form of the data to be analyzed. This
approach provides substantial flexibility and offers several advantages:

-  The decision to use either parametric or nonparametric procedures can
be suggested by the data, thus enabling selection of the most powerful
test available.

-  In those instances where reasons for selecting one procedure over the
other are not particularly compelling, both types of tests can be
performed and the results from the two can be compared and contrasted;
perhaps the results will be mutually supportive.

-  Regardless of which approach is ultimately selected, both approaches
contain specific tests which answer essentially equivalent hypotheses.

4.  Statistical  Error  and  Power  Efficiency 

The two types of statistical error that can occur are illustrated in
the chart below [Ref. 2, pg. 29]:

                                    The Decision
The true situation     Accept H0                   Reject H0

H0 is true             Correct decision            Type I error
                       probability = 1 - α         probability = α
                                                   (level of significance)

H0 is false            Type II error               Correct decision
                       probability = β             probability = 1 - β
                                                   (power)

Figure 1


Normally, α (the probability of a Type I error) is controlled by establishing
a specific significance level prior to performing the statistical test
(traditionally, α levels are set at .01 or .05). What is often neglected,
however, is β (the probability of a Type II error). In many, if not most, of the
JFAAD issues, the commission of a Type II error is more severe than commission
of a Type I error. For example, failing to detect a difference in early
warning (EW) techniques when in fact there is a difference (i.e., a Type II
error) can certainly be more disastrous than declaring that a difference
exists when in fact there really is none (a Type I error). Hence, while α can
be controlled by pre-selection, β is often ignored and, for small sample
sizes, may be unacceptably high even though it is of substantially more
importance in the context of the JFAAD issues and questions requiring
resolution.

Unfortunately, a decrease in α usually results in an undesirable increase
in β. For constant α levels, the power (1 - β) of a statistical test can be
increased by increasing the sample size of the test [Ref. 2, pg. «7].
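The trade-off among significance level, sample size, and power can be sketched numerically. The example below assumes a two-sided z-test comparing two proportions with equal group sizes and uses the usual normal approximation; the proportions 0.5 and 0.7 are hypothetical, and the function name is illustrative rather than anything defined in the TPD.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: p1 = p2,
    with n observations per group (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)              # std. error under H0
    se_alt = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)     # std. error under H1
    delta = abs(p1 - p2)
    upper = 1 - z.cdf((z_crit * se_null - delta) / se_alt)
    lower = z.cdf((-z_crit * se_null - delta) / se_alt)
    return upper + lower


# Hypothetical detection proportions for two early warning techniques.
sizes = (10, 20, 40, 80, 160)
powers = [power_two_proportions(0.5, 0.7, n) for n in sizes]
for n, pw in zip(sizes, powers):
    print(f"n per group = {n:4d}   power = {pw:.3f}   beta = {1 - pw:.3f}")

# Tightening alpha at a fixed sample size lowers power (raises beta).
print(f"alpha = .01, n = 40: power = {power_two_proportions(0.5, 0.7, 40, alpha=0.01):.3f}")
```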




Obviously, given a choice between two statistical tests, one would prefer
to choose that test which achieves maximum power for a specific sample size
and α level. Power efficiency, or more commonly "asymptotic relative
efficiency" (ARE), is a measure of comparison between two tests (a and b) and
indicates which test requires the smaller sample size (n) to achieve specified
α and power levels:

$$ \mathrm{ARE} = \lim_{n_a \to \infty} \frac{n_b}{n_a} $$
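As a worked illustration of how such a ratio is read, assume ARE is taken as the limiting ratio of the sample size n_b needed by test b to the sample size n_a needed by test a for the same α and power. The numbers below (an ARE of 0.80 and n_b = 80) are purely hypothetical and are not taken from any reference cited here.

```latex
\[
\mathrm{ARE} = \frac{n_b}{n_a} = 0.80, \qquad n_b = 80
\;\Longrightarrow\;
n_a = \frac{n_b}{\mathrm{ARE}} = \frac{80}{0.80} = 100 .
\]
```

Under these assumed values, test a would need roughly 100 observations to match what test b achieves with 80; an ARE of 1.0 would mean that the two tests need equal sample sizes.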

Comparisons  between  parametric  and  nonparametric  procedures  have  been 
performed  in  an  effort  to  determine  which  procedure  offers  the  more  powerful 
(or  equivalently,  more  efficient)  test  results.  These  results  are  provided 
below.  [Ref.  9,  pg.  87]: 
[Table not recoverable; surviving row label: Median test]

Figure 3


Due  to  the  distribution-free  assumption,  exact  power  calculations  for 
nonparametric  2-sample  and  many-sample  tests  are  not  available.  However,  for 
parametric  ANOVA,  procedures  and  tables  exist  to  determine  not  only  the  power 
of  a  particular  test,  but  more  importantly,  sample  size  necessary  to  achieve  a 
desired  power  level  prior  to  conducting  the  test.  Such  an  investigation  will 
aid immeasurably in developing a more powerful experimental design. Refer to
Annex C.

5.  Which  Approach?  A  Comparison  of  Parametric  vs.  Nonparametric 
Statistical  Tests 

In an attempt to compare and contrast the relative advantages and
disadvantages of parametric with nonparametric statistics, a thorough analysis
of the assumptions required for each approach is necessary. Especially
valuable is a recognition of the effects, if any, that violations of such
assumptions will have. Such an analysis should prove beneficial in the
attempt to identify the "best" statistical test for a particular hypothesis.

In general, violations of assumptions in parametric tests affect both the
sensitivity and the significance level of the test. For example, violations
of assumptions in the one-way ANOVA usually cause the F test to become less
efficient in detecting differences and to announce too many significant
differences. Appropriate parametric tests (e.g., t-tests and one-way ANOVA)
designed to detect differences among alternative treatments (e.g., different
types of early warning or identification procedures) require data from the
treatment groups to be:

-  normally  distributed, 

-  homoscedastic (of equal variance), and

-  independent  among  experimental  units. 


The second assumption, requiring constant variance, is analogous to asserting
that the means and variances of the treatments must be independent. Although
data expressed as proportions, or percentages, violate both the normality
(unless the sample size is sufficiently large to invoke the Central Limit
Theorem) and constant variance assumptions (e.g., for the binomial
distribution, the variance σ² = npq is a function of the mean μ = np), the
arcsin, or angular, transformation is regarded as an appropriate way to
convert the data such that parametric tests then become theoretically
plausible even for small sample sizes:


Transform:

$$ p' = 2 \arcsin \sqrt{p} $$

(See Annex A)

Figure 4
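The variance-stabilizing property of the angular transformation can be verified numerically. The sketch below assumes the form p' = 2 arcsin(sqrt(p)) and computes the exact variance of the transformed sample proportion by enumerating the binomial distribution; n = 50 and the three values of p are hypothetical choices made only for illustration.

```python
from math import asin, comb, sqrt


def transformed_variance(n: int, p: float) -> float:
    """Exact variance of 2*arcsin(sqrt(X/n)) for X ~ Binomial(n, p)."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    vals = [2 * asin(sqrt(x / n)) for x in range(n + 1)]
    mean = sum(v * q for v, q in zip(vals, pmf))
    return sum((v - mean) ** 2 * q for v, q in zip(vals, pmf))


n = 50  # hypothetical sample size
print(f"target value 1/n = {1 / n:.5f}")
for p in (0.2, 0.5, 0.8):
    raw = p * (1 - p) / n               # variance of the raw proportion
    stab = transformed_variance(n, p)   # variance after the transformation
    print(f"p = {p:.1f}   raw variance = {raw:.5f}   transformed variance = {stab:.5f}")
```

The raw variance changes with p, whereas the transformed variance stays close to 1/n for each value of p shown, which is the sense in which the transformation removes the dependence between mean and variance.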

The effect of data transformation, which corrects for the original lack of
normality and variance stability in the data, coupled with the known
"robustness" of the variance-ratio F tests used in the one-way ANOVA, then
allows such parametric tests to be used without fear of attaining erroneous
results (see also The Analysis of Binary Data, by D.R. Cox, on use of the
empirical logistic transform).

However, when advancing beyond the one-way ANOVA to the higher-order
factorial designs which are the commonly used parametric tests for large-scale
experiments (e.g., two-way ANOVA, blocked designs, fractional factorials,
etc.), an additional, and much more critical, assumption must be considered:
the requirement for additivity of treatment and interaction effects in a linear
model. In the case of the two-way ANOVA with multiple observations per cell,
the linear model is:

$$ x_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk} $$

where the assumption is made that each observation may be expressed as the
algebraic sum of:

1.  an overall mean, μ

2.  a "row effect", τ_i

3.  a "column effect", β_j

4.  an interaction effect, (τβ)_ij, and

5.  an experimental error (residual), ε_ijk ~ N(0, σ²)
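To make the terms of the linear model concrete, the following sketch decomposes a table of cell means into an overall mean, row effects, column effects, and interaction effects. The two 2 x 2 tables are hypothetical; the first is constructed to be exactly additive, the second is not.

```python
def decompose(cell_means):
    """Split a table of cell means into (overall mean, row effects,
    column effects, interaction effects) for a balanced two-way layout."""
    rows, cols = len(cell_means), len(cell_means[0])
    grand = sum(map(sum, cell_means)) / (rows * cols)
    row_eff = [sum(r) / cols - grand for r in cell_means]
    col_eff = [
        sum(cell_means[i][j] for i in range(rows)) / rows - grand
        for j in range(cols)
    ]
    inter = [
        [cell_means[i][j] - grand - row_eff[i] - col_eff[j] for j in range(cols)]
        for i in range(rows)
    ]
    return grand, row_eff, col_eff, inter


# Hypothetical cell means (rows and columns are two factors at two levels).
additive = [[10.0, 14.0], [12.0, 16.0]]   # column effect is the same in each row
crossed = [[10.0, 14.0], [12.0, 20.0]]    # column effect differs between rows

for name, table in (("additive", additive), ("crossed", crossed)):
    grand, row_eff, col_eff, inter = decompose(table)
    print(name, "| mean:", grand, "| rows:", row_eff, "| cols:", col_eff,
          "| interaction:", inter)
```

In the additive table every interaction effect is zero, so each cell mean is simply the overall mean plus its row and column effects; in the second table the nonzero interaction terms show that this description is no longer sufficient.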

The additional assumption involved, under the null hypotheses, is that of
additivity among all effect terms. In the case of factorial type experiments,
the assumption of additivity is rarely realistic, as evidenced by the numerous
interaction terms that are realistically significant. In the case of the
two-way ANOVA, if the interaction term is significant, one may conclude that
row and column factors are affecting the observations, and explore these
effects through other procedures (e.g., one-way ANOVAs on each row and
column). A significant interaction suggests that the row and
column effects are not additive, and accordingly the two-way ANOVA model is no
longer appropriate in terms of testing for pure row and column effects. The
magnitude of the errors that would result from continuing to test for row and
column effects is not known. In practice such tests are frequently performed
even though it is known that the additive model is incorrect. One should
always test interaction terms first: if the interaction term(s) are
insignificant, then a pooled mean square error term can be used to test for
main effects; if interactions are significant, then only those main effects
not involved in significant interactions should subsequently be tested.
Transformations also exist to reduce the effects of nonadditivity (so-called
"transformable non-additivity"). However, the art of transforming data in an
effort to achieve additivity is far less developed than similar methods used
to achieve normality and variance stabilization. Such attempts might be
practical for small factorial designs (such as a two-way ANOVA with only one
interaction term) but rapidly become impractical for large factorial designs
involving numerous interaction terms.

Many  of  the  assumptions  required  by  parametric  ANOVA  appear  to  be 
difficult,  at  best,  to  meet: 

1.  Continuity of data (much of the data is binary, giving rise to
the analysis of proportions),

2.  Normality and variance stability (proportions from a binomial
distribution are normally distributed in the limit only, and the
binomial distribution exhibits dependence between the mean and
its variance),

3.  Additivity in the linear model (numerous interactions can
realistically be expected, thus decreasing precision in attempts
to measure differences in the main effects which are of primary
concern to the JFAAD issues), and

4.  Confidence  intervals  on  estimates  can  be  decreased  only  by 
increasing  the  sample  size  (large  sample  sizes  may  be  cost 
prohibitive). 
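The cost implication of item 4 can be illustrated with the usual normal-approximation confidence interval for a proportion. This is a sketch under that assumption only; the estimate 0.5 and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval
    for a proportion estimated from n observations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical estimated proportion of 0.5 at several sample sizes.
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}   95% half-width = {ci_half_width(0.5, n):.4f}")
```

Since the half-width shrinks only with the square root of n, halving the interval requires four times as many observations.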

An alternative to the parametric approach is offered by various
nonparametric techniques. The major advantage offered by nonparametrics is
the lack of any distributional assumptions, hence the phrases
"distribution-free" and "assumption-free" statistics. The major disadvantage
of most of the nonparametric tests, with the exception of those based upon
"normal scores", is their lower power relative to comparable parametric tests,
since only part of the information contained in the data (usually based upon
ranks) is utilized in the statistical decision. Unlike the relative plethora
of parametric ANOVAs that abound in experimental design, nonparametric
procedures have not been developed to support analysis of large factorial
designs. However, as the ARE entries in Figure 3 show, some of the one-way
ANOVA equivalents (2-sample and many-sample tests) possess AREs with lower
bounds of 1.0. This indicates that these nonparametric tests have the same
asymptotic efficiency as their parametric counterparts when the population is
really normal and even larger asymptotic efficiencies when the population is
non-normal. Thus, when normality assumptions cannot be satisfied, these tests
provide more powerful results than comparable parametric tests such as the
t- and F-tests.

B.  ANALYSIS  METHODOLOGY 

Although a large factorial design approach with multiple (> 2) levels for
each treatment factor (MOE and MOP) appears to offer a comprehensive and
efficient "macroscopic" view of all the factors involved in an entire issue,
or even multiple issues, such an approach is conditional upon the validity of
the previously discussed assumptions involved in any parametric experimental
design, especially the assumption of linear additivity. The large quantity of
interaction terms, many of which should realistically reveal themselves to be
statistically significant, jeopardizes precise judgements of the main effects.
Additionally, providing a reasonable interpretation of the meaning of many of
the higher-order interaction terms may prove futile; for example, the highest
order interaction term in a factorial design for the identification issue will
be a cross product term involving seven elements (since there are seven
factors in the ID issue, ranging from the flight profile MOP up through the
dendrite to the ID system issue at the top of the pattern of analysis). As a
consequence of the vast amount of "background noise" (significant
interactions) it becomes virtually impossible to detect any main effect
differences, even if they do exist. The magnitude of the experimental design
consequently becomes counterproductive to the statistical analysis effort
because the main effect factors (MOP and MOE) have become inextricably
entwined in such a large number of interactions (recall that any main effect
involved in any significant interaction term cannot subsequently be tested
with precision for possible differences among its levels).
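The scale of the problem follows from simple counting. Assuming a full factorial model in the seven factors mentioned above, the number of interaction terms of each order is a binomial coefficient, as the short sketch below shows.

```python
from math import comb

factors = 7  # number of factors cited for the ID issue

# Number of interaction terms of each order (2-way, 3-way, ..., 7-way).
by_order = {k: comb(factors, k) for k in range(2, factors + 1)}
total_interactions = sum(by_order.values())

for k, count in by_order.items():
    print(f"{k}-factor interactions: {count}")
print(f"total interaction terms: {total_interactions}")
print(f"main effects: {factors}")
```

Seven main effects are therefore accompanied by 120 possible interaction terms, including a single seven-factor term.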

The large factorial design's inability to address the crucial main
effects questions with any precision clearly argues for a more satisfactory
approach in the analysis effort. An alternative approach, admittedly
"microscopic" and thus more tedious, involves sequential iterations of two-way
and one-way ANOVAs which successively examine each layer of MOPs and MOEs
within the pattern of analysis for each major issue. This methodical approach
allows main effects (the MOEs and MOPs at each "tier" in the pattern of
analysis) to be addressed with much greater precision and confidence bands
to be established using multiple comparison tests in those instances where
significant main effects are identified. When interactions between
progressively lower MOPs are identified, differences within each MOP can still
be explored through one-way ANOVAs with precision. Hence, the problems posed
by significant interaction terms (which precluded tests on main effects in the
large factorial design) can now be circumvented. Perhaps the greatest benefit
offered by this "one-step-at-a-time" approach is the opportunity to also
employ nonparametric tests. If data analysis reveals that parametric ANOVA
assumptions are questionable or clearly invalid, the alternative use of
nonparametric analogues for one-way ANOVAs now exists. In fact, the high ARE
for the 2- and many-sample comparison tests (refer to Figure 3) strongly
encourages their use regardless of the adequacy of comparable parametric
tests. This approach, illustrated below in "generic" form, allows either
classical parametric ANOVA, nonparametric tests of independence (which are
equivalent to "independence," or lack of interaction, between row and column
effects in the two-way ANOVA) and tests of comparison, or both parametric and
nonparametric tests to be used.
Figure 5


This approach appears to provide the opportunity to use both parametric and
nonparametric statistics:

1.  The use of known data transformations (e.g., logistic
and angular transforms) provides a promising means to
partially compensate for small sample sizes while
simultaneously converting the data such that it sufficiently
approximates the parametric assumptions of continuity,
normality, variance stability, and linear additivity.

2.  The high ARE nonparametric, multi-sample tests become
available for use and provide test results at least as
powerful as parametric ANOVA.


An appealing attribute of this approach, not available in large scale
parametric designs, is the opportunity to compare and contrast decisions on
statistical hypotheses obtained from both parametric and nonparametric statistical
tests. When both types of tests can be used with sound theoretical justification,
identical decisions to either reject or not reject a null hypothesis will provide
mutual reinforcement, lending greater credibility to the accuracy of
the decision. When the two procedures yield different results, the reason for the
disparity can be examined in an attempt to determine which result provides greater
credibility (e.g., a significant difference in a Kruskal-Wallis test may conflict
with a non-significant main effect result from a two-way ANOVA test; closer
scrutiny may reveal a marginally non-significant interaction term which suggests
that the additivity is questionable in the two-way ANOVA, hence greater credibility
should be attached to the Kruskal-Wallis results than those obtained from ANOVA).
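For readers unfamiliar with the rank-based many-sample test named above, the following sketch computes the Kruskal-Wallis H statistic from first principles. It is an illustration only: the three groups of scores are hypothetical, the function names are not drawn from any analysis plan, and no correction for ties is applied.

```python
def average_ranks(values):
    """Ranks 1..n for the pooled data, with tied values sharing
    the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum**2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical scores under three treatments (e.g., three EW techniques).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.3f}")
```

For large samples H is commonly referred to a chi-square distribution with (number of groups - 1) degrees of freedom; with groups as small as these, exact tables would ordinarily be consulted instead.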

Specific  statistical  tests  for  both  parametric  and  nonparametric  analysis 
are  outlined  in  flow  diagram  form  on  the  following  pages.  The  statistical  test 
logic  is  applicable  to  all  three  patterns  of  analysis. 
TOP-DOWN, SEQUENTIAL ANALYSIS OF VARIANCE APPROACH APPLIED
TO THE IDENTIFICATION ISSUE PATTERN OF ANALYSIS

Pattern of Analysis for Direct Identification System.

ANALYSIS PLAN: A

1.  Which ID system type (indirect or direct) kills more hostile
A/C?

2.  Which ID system results in lowest fratricide?


Techniques outlined in
II.B are directly applicable
ANALYSIS PLAN: C

Questions:

1.  Are different AD systems more effective against rotary wing
than fixed wing? (interaction)

2.  Is there an overall difference between A/C MOP for each ID
system?

3.  Which AD systems contribute most to the MOEs?

4.  Does ID system affect AD system performance? If so, how?
(e.g., do SHORAD systems account for more, or less, of the
MOEs?) (cross system investigation)

ANALYSIS PLAN: D


Questions:

1.  Is visual ID more or less effective than electronic ID for
SHORAD systems?

2.  Should IFF be allowed for fixed wing engagements but visual
ID required for rotary? (within ID issue investigation)

3.  Does electronic ID significantly reduce fratricide for
non-radar assisted systems? (cross system investigation)

Techniques outlined in
II.B are directly applicable

ANALYSIS PLAN: E

1.  Does  the  degree  of  early  warning  have  any  effect  upon  A/C 
identification?  (interaction) 

2.  Are  there  differences  among  ID  techniques?  EW? 

3.  If  an  indirect  ID  procedure  is  used  by  visually  sighted 
SHORAD  systems,  does  cueing  significantly  improve  the  MOEs? 
(cross  system  investigation) 


Techniques outlined in
II.B are directly applicable

ANALYSIS PLAN: F

1.  Does  the  amount  of  early  warning  make  any  difference  across 
different  flight  profiles?  (interaction) 

2.  What effect does EW have? Is cueing substantially better than
alerting information?

3.  Do flight profiles have any effect upon the MOEs?


Techniques outlined in
II.B are directly applicable


AN  EXAMPLE  OF  CROSS  SYSTEM  INVESTIGATION  OF  LOWER  LEVEL  MOP  EFFECTS 
Questions:

1.  Does EW type have an effect on ID system performance?
(interaction significant)

2.  Within each ID system, are there differences in fratricide
and % hostile killed?

3.  Which EW technique is better?


Techniques outlined in
II.B are directly applicable


AN  EXAMPLE  OF  CROSS  ISSUE  INVESTIGATION  OF  MOEs 
Questions:

1.  Does the effectiveness of ID procedures vary among
different ASM plans? (interaction)

2.  Should the ID procedures be standardized across all
ASM plans or should specific ID procedures be predicated
upon the ASM plan in effect?


Techniques outlined in
II.B are directly applicable

III.  OBSERVATIONS, FINDINGS AND RECOMMENDATIONS


I  have  divided  my  concluding  comments  into  three  categories  as  follows: 

1.  Observations  are  offered  in  the  form  of  major  concerns  I  would  have  if  I 
were  a  full-fledged  member  of  the  JFAAD  Test  Force.  These  concerns  evolved  as  I 
gradually  became  more  acquainted  with  JFAAD  and,  in  particular,  the  analysis  plan. 
As  always,  context  is  a  crucial  consideration  and  these  few  observations  are  based 
upon the "context" of my experience, to wit:

a)  All  of  my  tactical  air  defense  experience  has  been  devoted  to 
improving  effectiveness  at  and  below  battery  level,  especially  fire 
unit  procedures.  Consequently,  I  doubt  if  I  fully  appreciate  the 
"aggregate"  problems  that  must  be  contended  with  at  higher  command/ 
staff  levels. 

b)  My  understanding  of  the  actual  JFAAD  testbed  simulation  process  is 
incomplete;  specifically,  how  are  the  distributions  for  the  input 
variables  for  each  of  the  required  events  obtained?  (Ref:  Chapter 
3,  JFAAD  TPD) 

2.  Findings are presented which summarize the results of my research
effort.

3.  Recommendations  are  offered  for  consideration. 


A.  OBSERVATIONS 


1.  My initial impression has only been reinforced as I've attempted to "get
a handle" on the total JFAAD Test Force effort.  This is a massive, ambitious
undertaking that, because of its magnitude and complexity, appears nearly
impossible to resolve and yet, for the same reasons, demands such a resolution.  If
anything, the attempt to seek a resolution is long overdue and, regardless of the
eventual outcome, justifies the awesome effort required despite the intimidating
and seemingly insurmountable challenge it presents.

2.  I am convinced that the greatest danger toward realization of the full
potential for JFAAD success lies in a testbed simulation (or "model") that
possesses insufficient fidelity to accurately capture fundamental fire unit level
(crew level) activities and events that, when aggregated, could very well yield
profound (and not necessarily intuitive) results.  For example: 1) the type of
search and scan pattern employed by visually-directed FAAD crews has a tremendous
effect upon target detection range and time to detection, 2) VACR probabilities are
heavily dependent upon both type and number of aircraft that will be operating in
the FAAD region, and 3) in many FAAD fire unit situations cueing may actually be
counterproductive to the target detection effort.  In this regard, I think there
are procedural "crew-drill" type activities that will actually have a greater
impact upon the MOEs of fratricide and hostile aircraft kills than, for example,
system level issues such as EMSCS vs. Objective or different ASM plans.


B.  FINDINGS 


1.  My initial task was to manually derive "dummy" data, then "roll up" the
data through successively higher MOP/MOE levels for each of the three patterns of
analysis.  This was accomplished, for the most part, without difficulty and shows
that the hierarchical patterns of analysis support "bottom-up" data aggregation
and, consequently, either a "top-down" data analysis approach such as the one
detailed in this paper or a large-scale factorial design.  I believe that experience
allowed me to create fairly reasonable "dummy" data in most cases.  The one area
where I felt completely incapable of providing semi-realistic data involved
communication nodal performance for the C3I Analysis Plan (pg B-16 of JFAAD TPD).
While there is obviously no problem in rolling up from Table B-1 to Table B-2
(since B-2 is just a summary of B-1), I mention this because my brief research into
the commo field while at JFAAD (in an attempt to get realistic data) left me with
the impression that there exists a rather large gap in empirical data for this
issue (e.g., sensor transmission rates, reliability at the operator-machine
interface, realistic delay times, etc.).  Thus, it would appear difficult to
evaluate alternative communication linkages, the effects of sensor netting options,
and other excursion options without such data to "drive" the C3I MOP in such a
manner as to provide reasonably accurate results for comparing and contrasting
alternatives.

2.  The major portion of my research effort was spent in an attempt to find
"the best" statistical approach for data analysis.  This entailed an examination of
the theoretical assumptions that underlie many of the "usual" tests that, at first
glance, may appear appropriate.  The major results of my research efforts are
summarized as follows:


a)  Tests  of  proportions  must  be  used  with  caution  since  the  binomial 
distribution,  requiring  independent  trials  and  random  sampling  with  replacement, 
will  rarely  serve  as  an  appropriate  probability  model. 

b)  An examination of statistical error reveals that, in many cases, Type II
error is more important to control than Type I.  Hence the selection of a high
power level (1-β) may be more important than a small significance level (α).  This
has special significance for small sample sizes.

c)  Large-scale factorial designs are excellent for determining interactions
and measuring the magnitude of interactions; however, they preclude an examination
of main effects.

d)  Nonparametric tests are available that exhibit greater asymptotic relative
efficiency (ARE) than comparable parametric ANOVA.

e)  An  analysis  of  the  advantages  and  disadvantages  of  both  parametric  and 
nonparametric  statistical  tests  encourages  the  use  of  a  dual-approach  methodology, 
allowing  the  choice  of  a  specific  test  or  tests  to  ultimately  be  suggested  by  the 
data. 

f)  Such a statistical methodology allows a flexible "top-down" approach to
investigate each major issue.  Furthermore, it allows cross-investigation of
lower-level MOPs within different systems of a particular issue, as well as
cross-investigation of MOEs among different test conditions (e.g., scenario
location, air environment, EW environment, and visibility conditions).
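
As a rough illustration of the dual-approach idea in finding (e), the sketch below computes both a parametric statistic (the one-way ANOVA F) and a nonparametric one (the Kruskal-Wallis H) on the same data.  The sample values, the three EW conditions used as group labels, and the function names are hypothetical; they are not taken from the JFAAD testbed.  The H statistic uses midranks without a tie correction, and p-values are deliberately left to tables or a statistics package.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (one list per level)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using midranks (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    weighted = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical "dummy" MOP values for three EW conditions.
no_ew = [2.1, 2.4, 1.9, 2.6, 2.2]
alerting = [2.8, 3.1, 2.7, 3.4, 2.9]
cueing = [3.6, 3.3, 3.9, 9.5, 3.5]  # contains one outlying value

groups = [no_ew, alerting, cueing]
print("ANOVA F:         ", round(anova_f(groups), 3))
print("Kruskal-Wallis H:", round(kruskal_wallis_h(groups), 3))
```

Comparing the two statistics against their respective critical values on the same data is one simple way to let the data suggest which test to report.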


C.  RECOMMENDATIONS 


1.  Software.  Presumably, data generated from the testbed simulation will be
"tagged" and stored in a database for subsequent retrieval.  Hence there is a need
for an effective data retrieval system that allows efficient recovery of the
specific data to be analyzed.  A data retrieval system capable of directly
interfacing with a statistical graphics package would appear to offer a flexible
statistical analysis package.  The statistical routines needed would obviously
depend upon the type of analysis ultimately used.  SPSS routines could be used for
a large-scale factorial experimental design, or OA 3660 (even MINITAB) could be
used to automate the methodology I've proposed.  The purpose of the graphics
package would be to examine the data in an effort to ascertain distributional form,
thus aiding the selection of appropriate statistical tests, the examination of the
effects of transformations, and the presentation of results.

2.  Statistical Analysis.  The "dual-approach" methodology detailed in this
paper is recommended as it appears to offer a flexible, yet powerful and
theoretically sound approach, capable of providing detailed answers pertinent to
all facets of the JFAAD TPD.  This approach enables advantages offered by both
parametric and nonparametric statistics to be capitalized upon, thus enhancing the
credibility of test results.

3.  Additional Research.  Toward the end of my research effort, I stumbled
across yet another approach that appears to offer a promising alternative.  This
technique is referred to as analysis of "discrete multivariate data", "multivariate
binary data", and "cross-classified categorical data" in different references.  While
the theory is relatively advanced and I was not able to pursue the subject
extensively, it involves the analysis of multi-dimensional contingency tables (cell
entries are count data) using a hierarchical loglinear model.  The resultant model,
developed using an iterative proportional fitting procedure to compute maximum
likelihood estimates (MLEs) for each cell value, appears to be analogous to factorial
ANOVA in both functional form and interpretation.  Although the development of the
model appears to be quite a lengthy and tedious process, it fortunately has been
computerized.  The Operational Test and Evaluation Agency (OTEA) has and uses the
computer algorithms for large-scale loglinear models.  I strongly encourage further
investigation into this technique, especially if the software can be transported to
JFAAD from OTEA or if the JFAAD data can be transported to OTEA for analysis
there.
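
To make the iterative proportional fitting step more concrete, the following is a minimal sketch, not the OTEA software.  It fits a hierarchical loglinear model to a contingency table by repeatedly rescaling the fitted cells until they reproduce each specified marginal table of the observed counts.  The function name `ipf`, the 2x2x2 table of counts, and the reading of its three axes (for instance ID technique, EW type, and engagement outcome) are all hypothetical, and every observed margin is assumed to be positive.

```python
from itertools import product


def ipf(obs, margins, tol=1e-8, max_iter=500):
    """Iterative proportional fitting.

    obs     : dict mapping a cell index tuple to its observed count
    margins : list of axis tuples; each names one marginal table that
              the fitted values must reproduce
    Returns a dict of fitted cell values (the MLEs under the model).
    """
    fit = {cell: 1.0 for cell in obs}
    for _ in range(max_iter):
        max_change = 0.0
        for axes in margins:
            obs_m, fit_m = {}, {}
            for cell in obs:
                key = tuple(cell[a] for a in axes)
                obs_m[key] = obs_m.get(key, 0.0) + obs[cell]
                fit_m[key] = fit_m.get(key, 0.0) + fit[cell]
            for cell in fit:
                key = tuple(cell[a] for a in axes)
                new = fit[cell] * obs_m[key] / fit_m[key]
                max_change = max(max_change, abs(new - fit[cell]))
                fit[cell] = new
        if max_change < tol:  # a full cycle changed no cell appreciably
            break
    return fit


# Hypothetical 2x2x2 table of counts (dummy data).
counts = [12, 7, 9, 14, 5, 11, 8, 10]
obs = dict(zip(product(range(2), repeat=3), counts))

# Model with all two-factor terms but no three-factor interaction.
fit = ipf(obs, margins=[(0, 1), (0, 2), (1, 2)])
for cell in sorted(fit):
    print(cell, obs[cell], round(fit[cell], 3))
```

Dropping a tuple from `margins` removes the corresponding interaction term from the model, which is roughly how the factorial-ANOVA-like hierarchy of models is explored.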

4.  References.  I found the following texts to be particularly good in their
respective fields and recommend them as valuable references:

Nonparametric and Distribution-Free Methods for the Social Sciences, by
L. A. Marascuilo and M. McSweeney.

The Analysis of Cross-Classified Categorical Data, by S. E. Fienberg.

Discrete Multivariate Analysis: Theory and Practice, by Y. M. M. Bishop,
et al.

Interactive Data Analysis, by D. R. McNeil.


ANNEX  A 


DATA  TRANSFORMATION 

The  special  problems  that  surface  when  attempting  to  analyze  proportions,  or 
percentages,  derived  from  a  binary  response  variable  can  be  remedied  through 
various  data  transformations.  The  most  commonly  used  are: 

1.  logistic, log [P / (1 - P)]

2.  linear, P

3.  integrated normal (probit)

4.  arcsin, 2 arcsin √P

Cox  (in  The  Analysis  of  Binary  Data)  concludes  that  all  four  are  in 
reasonable  agreement  as  long  as  0.1  <  p  <  0.9.  The  primary  advantage  in  using  the 
arcsin,  or  angular,  transformation  is  its  ability  to  remove  variance  dependence 
upon  the  mean  in  a  binomial-type  variable.  The  transformation: 

2 arcsin √Y,     0 < Y < 1

yields  the  following  tabulated  results  (Y  is  used  to  denote  the  proportion  P): 
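
The tabulated values can be regenerated directly from the formula.  The sketch below does so and also checks the variance-stabilizing property numerically: for X distributed Binomial(n, p), the variance of the raw proportion X/n is p(1-p)/n and so depends on p, whereas the variance of the transformed value is roughly 1/n for any p that is not too close to 0 or 1.  The function names and the choice n = 50 are illustrative assumptions only.

```python
import math


def angular(y):
    """Angular (arcsine) transform, 2 * arcsin(sqrt(y)), of a proportion."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(y))


def transformed_variance(n, p):
    """Exact variance of angular(X / n) when X ~ Binomial(n, p)."""
    probs = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    vals = [angular(x / n) for x in range(n + 1)]
    m = sum(w * v for w, v in zip(probs, vals))
    return sum(w * (v - m) ** 2 for w, v in zip(probs, vals))


# Regenerate a short table of transformed proportions.
for y in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"Y = {y:4.2f}   2 arcsin sqrt(Y) = {angular(y):6.4f}")

# Compare raw and transformed variances for n = 50 (illustrative choice).
n = 50
for p in (0.2, 0.5, 0.8):
    raw = p * (1 - p) / n
    print(f"p = {p}: raw variance {raw:.5f}, "
          f"transformed variance {transformed_variance(n, p):.5f}, 1/n = {1 / n:.5f}")
```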
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MULTIPLE  COMPARISON  (POST-HOC)  TESTS 


A.  PARAMETRIC  MULTIPLE  COMPARISON  TESTS: 

In those instances when a null hypothesis for a treatment is rejected (i.e.,
differences among levels of a treatment do exist), it does not necessarily follow
that each level is different from the others.  Hence, once a null hypothesis is
rejected, it becomes extremely important in the JFAAD context to determine which
levels are different and which are not (e.g., differences in the EW MOP among
Alerting, Cueing, and no EW).  Various multiple comparison tests have been
developed, including (see Reference 4, pp. 262-271 and Reference 10, pp. 233-238):

1.  Fisher's Least Significant Difference

2.  Duncan's New Multiple Range Test

3.  The Student-Newman-Keuls' Procedure

4.  Tukey's Honestly Significant Difference

5.  Scheffe's Method

The choice of a particular test is dependent upon which type of error (Type
I or Type II) is more serious.  In most cases it is desirable to reduce the Type
II error (i.e., if differences do exist among levels of a particular MOP, it is
important that such differences actually be detected by the test; a test is needed
with high power).  Thus, in the JFAAD context, Fisher's Test, the most powerful of
the five procedures, will be appropriate to test for differences among MOP levels
when a MOP factor has been declared significant by ANOVA procedures.

Comparison of Multiple Comparison Procedures

Multiple Comparison
Procedure                   Power        Type I Error Rate

Fisher's                    Highest      Highest
Duncan's                       |            |
Student-Newman-Keuls'          |            |
Tukey's                        |            |
Scheffe's                   Lowest       Lowest

Procedures toward the bottom of the table are more conservative (less likely to
detect real differences); procedures toward the top are more likely to indicate
false differences.
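
A minimal sketch of Fisher's Least Significant Difference follows, assuming the ANOVA F test has already declared the factor significant.  Two level means are declared different when their absolute difference exceeds t * sqrt(MSE * (1/n_i + 1/n_j)), where MSE is the within-level mean square and t is the two-sided critical t value for the error degrees of freedom.  The function name, the dummy data, and the EW level labels are hypothetical, and the critical value must be supplied from a t table because the sketch uses only the standard library.

```python
from itertools import combinations
from statistics import mean


def fisher_lsd(groups, t_crit):
    """Pairwise Fisher LSD comparisons.

    groups : dict mapping level name -> list of observations
    t_crit : two-sided critical t value for (N - k) error degrees of
             freedom, read from a t table by the caller
    Returns a list of (level_a, level_b, |difference|, LSD, significant).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    mse = ss_within / (n_total - k)
    results = []
    for (a, ga), (b, gb) in combinations(groups.items(), 2):
        diff = abs(mean(ga) - mean(gb))
        lsd = t_crit * (mse * (1 / len(ga) + 1 / len(gb))) ** 0.5
        results.append((a, b, diff, lsd, diff > lsd))
    return results


# Hypothetical dummy data: 3 levels x 5 observations, so error df = 12.
data = {
    "no EW": [2.1, 2.4, 1.9, 2.6, 2.2],
    "alerting": [2.8, 3.1, 2.7, 3.4, 2.9],
    "cueing": [3.6, 3.3, 3.9, 3.7, 3.5],
}
for a, b, diff, lsd, sig in fisher_lsd(data, t_crit=2.179):  # t(.025, 12)
    print(f"{a} vs {b}: |diff| = {diff:.3f}, LSD = {lsd:.3f}, different = {sig}")
```

Because each pair is tested at the full significance level, this procedure keeps power high at the cost of the higher Type I error rate noted in the comparison table.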


B.  NONPARAMETRIC  MULTIPLE  COMPARISON  TESTS 


Multiple comparison tests are also available for the nonparametric rank and
normal-scores tests.  These so-called "post-hoc" tests can be found on the pages
listed below in the following references:


Test                 Conover (Ref. 2)     Marascuilo and McSweeney (Ref. 9)

Kruskal-Wallis       231                  306-310
Quade                297                  -
Friedman             300                  362-366
Van der Waerden      319                  405-414

ANNEX  C 


POWER CALCULATIONS IN ANOVA

As discussed earlier, improving power (1-β) in ANOVA can be accomplished by
either:

1.  Increasing α (probability of Type I error), which can be accomplished by
pre-selecting larger α values, for example α = .1 instead of .05 or .01, or

2.  Increasing the sample size for a constant α level.

Knowledge of power is useful not only for assessing the discriminating
ability of the test, but also for determining the necessary sample size to achieve
a desired level of power.  Detailed power calculations are provided in Reference 5,
pp. 615-619, 660 and Reference 8, pp. 142-145.  Power function curves (Operating
Characteristic - "OC Curves") have been developed and can be used to assist
determination of sample size (see, for example, Figure 29.1, Reference 5, p. 617).


However, a simple procedure, requiring no a priori information of sample
variance, is available simply by specifying α, the desired power (1-β), p (the
number of levels (columns) in a 1-Way ANOVA), and C (which is a multiple of the
unknown residual error, σe) and using Table E.15 of Reference 8 (pp. 840-841),
reproduced on the next page.


This technique should prove very valuable in estimation of minimum sample
sizes necessary to achieve the desired level of power.  It illustrates the
tradeoffs between α and β, as well as the differential improvement that can be
gained in power by increasing the sample size for a given test.
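
The tradeoffs described above can be illustrated with a deliberately simplified stand-in.  The sketch below uses a one-sided z-test with known standard deviation rather than the ANOVA F test, so it does not reproduce Table E.15 or the OC curves; it only shows the same qualitative behavior, namely that power rises when either α or the sample size is increased.  The function names, the effect size of one-half standard deviation, and the target power of .80 are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha):
    """Power of a one-sided z-test of H0: mu = 0 against mu = delta > 0.

    delta : true shift to be detected
    sigma : known standard deviation of a single observation
    n     : sample size
    alpha : Type I error probability
    """
    z = NormalDist()
    return z.cdf(delta * n**0.5 / sigma - z.inv_cdf(1 - alpha))


def min_sample_size(delta, sigma, alpha, power):
    """Smallest n for which the z-test reaches the desired power."""
    n = 1
    while z_test_power(delta, sigma, n, alpha) < power:
        n += 1
    return n


# Effect of alpha at a fixed sample size (illustrative values).
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}, n = 20: power = "
          f"{z_test_power(0.5, 1.0, 20, alpha):.3f}")

# Minimum sample size needed for power .80 at each alpha.
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: minimum n = "
          f"{min_sample_size(0.5, 1.0, alpha, 0.80)}")
```

Expressing the shift as a multiple of the standard deviation mirrors the role of C in the table-based procedure, since neither approach then requires an advance estimate of the variance itself.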
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